In this paper, we propose new sequential methods for detecting port-scan attackers, which
routinely perform random "portscans" of IP addresses to find vulnerable servers to compromise.
In addition to rigorously controlling the probability of falsely implicating benign remote
hosts as malicious, our method performs significantly faster than other current solutions.
Moreover, our method guarantees that the maximum amount of observational time is bounded. In
contrast to the previous most effective method, the Threshold Random Walk Algorithm, which is
explicit and analytical in nature, our proposed algorithm involves parameters to be determined
by numerical methods. We have developed computational techniques such as iterative minimax
optimization for quick determination of the parameters of the new detection algorithm.
A framework of multi-valued decision for testing portscanners is also proposed.



1 Introduction 

As the Internet becomes pervasive in our society, it is increasingly important to develop
high-performance network intrusion detection systems (NIDS) that identify an attacker early
enough to allow a protective response that mitigates or fully prevents damage. An important
need in such a NIDS is prompt response: the sooner a NIDS detects malice, the lower the
resulting damage. At the same time, a NIDS should not falsely implicate benign remote hosts as
malicious [21 HIE]. There are many types of network intrusions. An extremely dangerous one is
the port-scan intrusion. A port-scan is an attack that sends client requests to a range of
server port addresses on a host, with the goal of finding an active port and exploiting a
known vulnerability of that service [7], [9], [10].

In recent years, some detection schemes have been developed by virtue of statistical hypothesis
testing. For example, the problem of detecting port-scan attacks has been addressed in the
framework of testing a binomial parameter. In this direction, adaptive methods such as the
Sequential Probability Ratio Test [11] have been explored for fast detection of port-scan
attacks. However, these techniques generally suffer from two drawbacks. First, the maximum
number of required observations is not deterministically bounded. Hence, there is a positive
probability that the detection time is extremely long. Second, the existing detection
algorithms usually attempt to be optimal for only a few parametric values, and consequently
the average performance for other parametric values may be very poor. In order to overcome
these limitations, we propose new methods for fast detection of port-scan attacks in the
general framework of multistage tests of hypotheses.

The remainder of the paper is organized as follows. In Section 2, we consider the problem of
testing for port-scan attacks and discuss the widely accepted binomial model. In Section 3, we
review the threshold random walk detection algorithm. In Section 4, we introduce a new
sequential algorithm for detecting port-scan attacks. In Section 5, a framework of multi-valued
decision for testing portscanners is proposed. Section 6 is the conclusion.

2 Binomial Model 

A major characteristic of scanners is that they have a higher chance than legitimate remote hosts
to choose hosts which do not exist or do not have the requested service activated, since they lack 
precise knowledge of which hosts and ports on the target network are currently active [U El [TD] . 
Based on this observation, a detection problem has been formulated to provide the basis for an 
on-line algorithm whose goal is to reduce the number of observed connection attempts (compared 
to previous approaches) to flag malicious activity, while bounding the probabilities of missed 
detection and false detection. In this direction, a widely accepted model is the binomial model 
[U [9] described in the sequel. 

We shall adopt the description of [I] for the binomial model used for the detection of port-scan 
attacks. A connection attempt by a remote source $r$ to a local destination $l$ can be
considered as a random event. A common way to model such an event is to classify the outcome
of the attempt as either a "success" or a "failure", where the latter corresponds to a
connection attempt to an inactive host or to an inactive service on an otherwise active host.
More formally, for a given $r$, let $X_i$ be a random variable that represents the outcome of
the first connection attempt by $r$ to the $i$-th distinct local host, where

$$X_i = \begin{cases} 1 & \text{if the connection attempt is a success}, \\ 0 & \text{if the connection attempt is a failure}. \end{cases} \qquad (1)$$

As illustrated in [H [5], it is reasonable to assume that $X_i$, $i = 1, 2, \ldots$ are
independent and identically distributed Bernoulli random variables such that

$$\Pr\{X_i = 1\} = 1 - \Pr\{X_i = 0\} = p,$$

where $p \in (0, 1)$ is the success rate of making a connection. Usually, the success rate $p$
is unknown and varies across types of users. However, the success rate $p$ of a scanner is
normally very low, while the success rate $p$ of a benign user is high. By appropriately
choosing threshold values $p_0$ and $p_1$ such that $0 < p_0 < p_1 < 1$ based on empirical data
analysis of relevant networks, the hypothesis that "the host is a scanner" can be formulated as
$\mathcal{H}_0 : p \le p_0$. Similarly, the hypothesis that "the host is a benign user" can be
formulated as $\mathcal{H}_1 : p \ge p_1$. This amounts to the problem of testing statistical
hypotheses

$$\mathcal{H}_0 : p \le p_0 \quad \text{versus} \quad \mathcal{H}_1 : p \ge p_1$$

based on $X_i$, $i = 1, 2, \ldots$. Throughout the remainder of this paper, let
$\Pr\{E \mid p\}$ denote the probability of event $E$ associated with $p$. To control the
probabilities of making wrong decisions, it is typically required that

$$\Pr\{\text{Reject } \mathcal{H}_0 \mid p\} \le \alpha \ \text{ for all } p \in (0, p_0], \qquad \Pr\{\text{Reject } \mathcal{H}_1 \mid p\} \le \beta \ \text{ for all } p \in [p_1, 1) \qquad (2)$$



where $\alpha, \beta \in (0, 1)$ are some pre-specified numbers. In order to minimize the
potential damage of network intrusion and control the probability of false alarm, it is
desirable to make this detection as quickly as possible, but with a high probability of being
correct. The above formulation of the port-scanner detection problem has been proposed by a
number of researchers, and many detection algorithms have been developed. One of the most
effective algorithms for early scan detection is the Threshold Random Walk Algorithm (TRWA)
developed in [3], which is presented in the following section.

3 Threshold Random Walk Algorithm 

The widely cited Threshold Random Walk Algorithm [3] is derived from the famous Sequential
Probability Ratio Test (SPRT) invented by Abraham Wald [11] during wartime in response to the
demand for efficient testing of ammunition. Define the relative frequency
$\hat{p}_n = \frac{\sum_{i=1}^{n} X_i}{n}$ for $n = 1, 2, \ldots$. The idea of TRWA is to
continuously observe the probability ratio

$$\frac{\Pr\{X_1, \ldots, X_n \mid p_0\}}{\Pr\{X_1, \ldots, X_n \mid p_1\}}$$

for $n = 1, 2, \ldots$. The observational process is continued until
$\frac{\Pr\{X_1, \ldots, X_n \mid p_0\}}{\Pr\{X_1, \ldots, X_n \mid p_1\}} \le k_0$ or
$\frac{\Pr\{X_1, \ldots, X_n \mid p_0\}}{\Pr\{X_1, \ldots, X_n \mid p_1\}} \ge k_1$
for some positive integer $n$, where $k_0 < k_1$ are two pre-specified positive numbers for
controlling the probability of making wrong decisions. At the termination of the observational
process, a decision is made as follows:

If $\frac{\Pr\{X_1, \ldots, X_n \mid p_0\}}{\Pr\{X_1, \ldots, X_n \mid p_1\}} \le k_0$, then
declare the source $r$ as a benign user. If
$\frac{\Pr\{X_1, \ldots, X_n \mid p_0\}}{\Pr\{X_1, \ldots, X_n \mid p_1\}} \ge k_1$, then
declare the source $r$ as a scanner.
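
As a rough illustration of the stopping and decision rules just described, the following Python sketch accumulates the logarithm of the probability ratio one observation at a time. It assumes that the ratio is the likelihood of the observed outcomes under $p_0$ divided by their likelihood under $p_1$, and that the thresholds are taken as $k_0 = \alpha$ and $k_1 = 1/\beta$. The function name `trwa_decide` and the `"undecided"` label are hypothetical and serve only this example.

```python
import math

def trwa_decide(outcomes, p0, p1, alpha, beta):
    """Sketch of a threshold random walk on the log probability ratio.

    outcomes: iterable of 0/1 connection results (1 = success).
    Returns (label, n) where label is 'scanner', 'benign' or 'undecided'
    and n is the number of observations consumed.
    Assumed thresholds: k0 = alpha (lower), k1 = 1/beta (upper).
    """
    log_k0 = math.log(alpha)
    log_k1 = math.log(1.0 / beta)
    step_success = math.log(p0 / p1)              # negative, since p0 < p1
    step_failure = math.log((1 - p0) / (1 - p1))  # positive, since p0 < p1
    log_ratio = 0.0
    n = 0
    for n, x in enumerate(outcomes, start=1):
        log_ratio += step_success if x == 1 else step_failure
        if log_ratio <= log_k0:
            return "benign", n
        if log_ratio >= log_k1:
            return "scanner", n
    return "undecided", n
```

Working on the log scale avoids underflow when many observations are multiplied together; the walk moves up on each failure and down on each success, which is where the name "random walk" comes from.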

It can be shown that TRWA has the following properties: if
$0 < k_0 = \alpha < 1 < \frac{1}{\beta} = k_1$, then the TRWA ensures the risk requirement (2).
Moreover, the average number of observations is minimized for both $p_0$ and $p_1$ among all
possible tests of $\mathcal{H}_0 : p \le p_0$ versus $\mathcal{H}_1 : p \ge p_1$ such that
$\Pr\{\text{Reject } \mathcal{H}_0 \mid p_0\} \le \alpha$ and
$\Pr\{\text{Reject } \mathcal{H}_1 \mid p_1\} \le \beta$.

Despite its remarkable simplicity and optimality for threshold values, the TRWA has the
following major drawbacks. First, the number of observations is not bounded by a deterministic
number. In the extreme case, the detection time can be unacceptably long. Second, as a
consequence of the fact that TRWA is optimal when the true success rate $p$ assumes value $p_0$
or $p_1$, the average performance can be very poor when the true rate of success differs from
$p_0$ and $p_1$. Since the choice of threshold values $p_0$ and $p_1$ is based on empirical
data analysis and is thus somewhat arbitrary, the performance of the detection algorithm is
important for $p$ taking values different from $p_0$ and $p_1$. To overcome these drawbacks,
we propose a new detection method in the next section.



4 New Detection Algorithm 

Our new detection algorithm depends on three positive parameters $a$, $b$ and $\zeta$, which
are to be determined by a computational method to guarantee the risk requirement. The parameter
$\zeta$ is called the risk tuning parameter. The parameters $a$ and $b$ are referred to as
weighting coefficients. Let the relative frequency $\hat{p}_n$ be defined as before. For ease
of describing our detection algorithm, define new random variables $Y_n$ and $Z_n$, each
specified piecewise for the three cases $\hat{p}_n = 0$, $0 < \hat{p}_n < 1$ and
$\hat{p}_n = 1$, for $n = 1, 2, \ldots$. We are now in a position to state the stopping and
decision rules of our detection algorithm.



4.1 Stopping and Decision Rules 

Assuming that the risk tuning parameter and weighting coefficients have been determined so as
to satisfy the risk requirement (2), our detection algorithm can be described as follows.

Continue taking observations until $Y_n \ge \frac{1}{n} \ln \frac{1}{\zeta a}$,
$\hat{p}_n \ge p_0$ or $Z_n \ge \frac{1}{n} \ln \frac{1}{\zeta b}$, $\hat{p}_n \le p_1$ for
some positive integer $n$. At the termination of the observational process, make the following
decision: If $Z_n \ge \frac{1}{n} \ln \frac{1}{\zeta b}$, $\hat{p}_n \le p_1$, then declare
the source $r$ as a scanner. If $Y_n \ge \frac{1}{n} \ln \frac{1}{\zeta a}$,
$\hat{p}_n \ge p_0$, then declare the source $r$ as a benign user.

For $p_0 = 0.2$, $p_1 = 0.8$, our stopping and decision rules with $\zeta = 1$ and
$a = b = 0.1$ are shown in Figure 1. The lower shaded area represents the acceptance region of
$\mathcal{H}_0$. The upper shaded area represents the rejection region of $\mathcal{H}_0$. The
blue line with star symbols represents a sample path. The observational process is continued
until the sample path hits either the acceptance region or the rejection region of
$\mathcal{H}_0$. If the sample path hits the acceptance region of $\mathcal{H}_0$, then declare
that $r$ is a scanner. If the sample path hits the rejection region of $\mathcal{H}_0$, then
declare that $r$ is a benign user.




Figure 1: An illustration of the new detection algorithm (horizontal axis: number of observations)
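
The stopping and decision rules of Section 4.1 can be sketched in Python as follows. The sketch rests on assumptions that should be checked against the original definitions: it takes $Y_n$ and $Z_n$ to be the Kullback-Leibler divergences of a Bernoulli($\hat{p}_n$) law from Bernoulli($p_0$) and Bernoulli($p_1$) respectively, and it takes the thresholds to be $\frac{1}{n}\ln\frac{1}{\zeta a}$ and $\frac{1}{n}\ln\frac{1}{\zeta b}$. The helper names `kl_bernoulli`, `decide` and `run` are hypothetical.

```python
import math

def kl_bernoulli(z, p):
    """KL divergence of Bernoulli(z) from Bernoulli(p), with 0*log(0) taken as 0."""
    total = 0.0
    if z > 0:
        total += z * math.log(z / p)
    if z < 1:
        total += (1 - z) * math.log((1 - z) / (1 - p))
    return total

def decide(successes, n, p0, p1, a, b, zeta):
    """Return 'benign', 'scanner', or None (keep observing) after n observations."""
    p_hat = successes / n
    # Assumed form of Y_n and its threshold: reject H0, i.e. declare benign.
    if p_hat >= p0 and kl_bernoulli(p_hat, p0) >= math.log(1 / (zeta * a)) / n:
        return "benign"
    # Assumed form of Z_n and its threshold: accept H0, i.e. declare scanner.
    if p_hat <= p1 and kl_bernoulli(p_hat, p1) >= math.log(1 / (zeta * b)) / n:
        return "scanner"
    return None

def run(outcomes, p0, p1, a, b, zeta):
    """Feed 0/1 outcomes to the rule; return (decision, observations used)."""
    successes = 0
    n = 0
    for n, x in enumerate(outcomes, start=1):
        successes += x
        verdict = decide(successes, n, p0, p1, a, b, zeta)
        if verdict is not None:
            return verdict, n
    return None, n
```

With the parameters quoted for Figure 1 ($p_0 = 0.2$, $p_1 = 0.8$, $\zeta = 1$, $a = b = 0.1$), two consecutive failures are enough for this sketch to declare a scanner, and two consecutive successes are enough to declare a benign user.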

4.2 Determination of Risk Tuning Parameter and Weighting Coefficients 

Given that our detection algorithm can be parameterized as in Section 4.1, we need to determine
the risk tuning parameter $\zeta$ and weighting coefficients $a$, $b$ so that the required
number of observations is as small as possible, while guaranteeing the risk requirement (2).
The computational process for accomplishing this task is called risk tuning. Clearly, the risk
requirement is satisfied if $\zeta$ is sufficiently small. This implies that if the weighting
coefficients are given, one can determine the risk tuning parameter $\zeta$ to meet the risk
requirement by the following two steps: First, find the maximum number, $\underline{\zeta}$, in
the set $\{2^{-i} : i \in \mathbb{N}\}$, where $\mathbb{N}$ is the set of natural numbers, such
that the risk requirement is satisfied when the risk tuning parameter $\zeta$ assumes value
$\underline{\zeta}$. Second, apply a bisection search method to obtain a number $\zeta^*$ as
large as possible from the interval $[\underline{\zeta}, 2\underline{\zeta})$ such that the
risk requirement is satisfied when the risk tuning parameter $\zeta$ assumes value $\zeta^*$.
However, these two steps are not sufficient to produce a detection algorithm of satisfactory
efficiency if the weighting coefficients are not properly chosen. To overcome this limitation,
we observe that an effective way to make a detection algorithm efficient is to make it
efficient when the success rate $p$ assumes values $p_0$ and $p_1$. This is a consequence of
the fact that $\Pr\{\text{Accept } \mathcal{H}_0 \mid p\}$ is non-increasing with respect to
$p \in (0, 1)$. Due to the monotonicity of the operating characteristic function, it suffices
to ensure $\Pr\{\text{Reject } \mathcal{H}_0 \mid p_0\} \le \alpha$ and
$\Pr\{\text{Reject } \mathcal{H}_1 \mid p_1\} \le \beta$ to satisfy the risk requirement (2).
Define

$$A = \frac{\alpha}{\Pr\{\text{Reject } \mathcal{H}_0 \mid p_0\}}, \qquad B = \frac{\beta}{\Pr\{\text{Reject } \mathcal{H}_1 \mid p_1\}}, \qquad Q = \max\{A, B\}, \qquad R = \min\{A, B\}$$

as functions of $a$, $b$ and $\zeta$. For the purpose of developing an efficient detection
algorithm satisfying the risk requirement, we propose to determine the risk tuning parameter
$\zeta$ and weighting coefficients $a$, $b$ such that $Q$ is minimized under the constraint
that $R$ is no less than 1. This task can be accomplished by applying the iterative minimax
optimization algorithm described as follows.
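
To make the quantities $A$, $B$, $Q$, $R$ and the two-step search for $\zeta$ concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the same stopping rule as the sketch in Section 4.1 ($Y_n$ and $Z_n$ taken as Kullback-Leibler divergences with thresholds $\frac{1}{n}\ln\frac{1}{\zeta a}$ and $\frac{1}{n}\ln\frac{1}{\zeta b}$), it assumes the dyadic set starts at $2^0 = 1$, and it uses a plain forward recursion over the number of successes as a stand-in for the path counting or recursive methods cited in the text. All function names are hypothetical.

```python
import math

def kl(z, p):
    """KL divergence of Bernoulli(z) from Bernoulli(p), with 0*log(0) taken as 0."""
    out = 0.0
    if z > 0:
        out += z * math.log(z / p)
    if z < 1:
        out += (1 - z) * math.log((1 - z) / (1 - p))
    return out

def decision(s, n, p0, p1, a, b, zeta):
    """Assumed stopping rule: 'benign', 'scanner', or None to keep observing."""
    z = s / n
    if z >= p0 and kl(z, p0) >= math.log(1 / (zeta * a)) / n:
        return "benign"   # corresponds to rejecting H0
    if z <= p1 and kl(z, p1) >= math.log(1 / (zeta * b)) / n:
        return "scanner"  # corresponds to rejecting H1
    return None

def reject_probs(p, p0, p1, a, b, zeta, n_cap=100_000):
    """Return (Pr{Reject H0 | p}, Pr{Reject H1 | p}) by forward recursion."""
    alive = {0: 1.0}  # alive[s] = Pr{s successes so far and no decision yet}
    rej0 = rej1 = 0.0
    n = 0
    while alive and n < n_cap:
        n += 1
        nxt = {}
        for s, pr in alive.items():
            for s2, q in ((s + 1, pr * p), (s, pr * (1 - p))):
                d = decision(s2, n, p0, p1, a, b, zeta)
                if d == "benign":
                    rej0 += q
                elif d == "scanner":
                    rej1 += q
                else:
                    nxt[s2] = nxt.get(s2, 0.0) + q
        alive = nxt
    return rej0, rej1

def ratios(alpha, beta, p0, p1, a, b, zeta):
    """Return (A, B, Q, R) as defined in the text."""
    A = alpha / reject_probs(p0, p0, p1, a, b, zeta)[0]
    B = beta / reject_probs(p1, p0, p1, a, b, zeta)[1]
    return A, B, max(A, B), min(A, B)

def tune_zeta(alpha, beta, p0, p1, a, b, bisections=30):
    """Two-step search for zeta with the weighting coefficients held fixed."""
    def ok(zeta):
        return ratios(alpha, beta, p0, p1, a, b, zeta)[3] >= 1.0

    low = 1.0  # assumption: the dyadic set {2**-i} starts at i = 0
    for _ in range(60):
        if ok(low):
            break
        low /= 2
    else:
        raise ValueError("no dyadic zeta satisfied the risk requirement")
    lo, hi = low, 2 * low  # invariant: lo always satisfies the requirement
    for _ in range(bisections):
        mid = (lo + hi) / 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Because the bisection only ever moves `lo` to a value that has been checked, the returned $\zeta$ satisfies $R \ge 1$ even if the risk is not perfectly monotone in $\zeta$; it is then a feasible value rather than a guaranteed maximum.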

∇ Set the maximum number of iterations as $k_{\max}$. Choose the initial values of the
weighting coefficients as $a = \alpha$ and $b = \beta$. Let $Q \leftarrow \infty$ and
$k \leftarrow 0$.

∇ While $k < k_{\max}$, do the following:

◦ Use a bisection search method to determine a number $\zeta^* > 0$ as large as
possible for $\zeta$ such that the value of $R$ associated with $a$, $b$ and $\zeta^*$ is no
less than 1. Let $A^*$, $B^*$ and $Q^*$ respectively denote the corresponding
values of $A$, $B$ and $Q$.

◦ If $Q^* < Q$, then let $a \leftarrow \zeta^* a$, $b \leftarrow \zeta^* b$ and
$Q \leftarrow Q^*$. If $A^* = Q^*$, then let $a \leftarrow \zeta^* a$(1 + ^i). If
$B^* = Q^*$, then let $b \leftarrow \zeta^* b$(1 + ^i). Let $k \leftarrow k + 1$.

∇ Return $\zeta = 1$ as the desired risk tuning parameter and $a$, $b$ as
the weighting coefficients.

The intuition behind this algorithm is that $\Pr\{\text{Reject } \mathcal{H}_0 \mid p_0\}$ and
$\Pr\{\text{Reject } \mathcal{H}_1 \mid p_1\}$ are "roughly" increasing with respect to $a$ and
$b$, respectively, when the risk tuning parameter $\zeta$ is fixed.

In the execution of the algorithm, we need to compute probabilistic terms like
$\Pr\{\text{Reject } \mathcal{H}_0 \mid p_0\}$ and
$\Pr\{\text{Reject } \mathcal{H}_1 \mid p_1\}$. These quantities can be computed by the path
counting method of [2] or the recursive algorithm of [8].

4.3 Maximum Number of Observations

One salient feature of the above algorithm is that the maximum number of observations is
absolutely bounded. Moreover, the maximum number is the least integer no less than $m$ which
satisfies the following equations:




where $z \in (p_0, p_1)$. To solve the above equations for $m$, we first eliminate $m$ and obtain

from which we find the root $z = z^*$ by a bisection search method. Afterward, we substitute
$z = z^*$ into the first equation to obtain the corresponding $m = m^*$. Then, the maximum
number of observations is equal to $\lfloor m^* \rfloor + 1$. It should be noted that, in the
special case of $a = b$, we have

$$(1 - z) \ln \frac{1 - p_0}{1 - p_1} - z \ln \frac{p_1}{p_0} = 0,$$

from which we obtain

$$z^* = \frac{\ln \frac{1 - p_0}{1 - p_1}}{\ln \left( \frac{1 - p_0}{1 - p_1} \cdot \frac{p_1}{p_0} \right)}$$

and a closed-form formula for $m^*$.
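
For the special case $a = b$, the computation can be sketched in a few lines of Python. The expression for $z^*$ follows from the special-case equation above. The expression used for $m^*$ is an assumption: it supposes that the first of the two equations for $m$ has the form $m \cdot D(z, p_0) = \ln\frac{1}{\zeta a}$, where $D(z, p)$ denotes the Kullback-Leibler divergence of Bernoulli($z$) from Bernoulli($p$). The function names are hypothetical.

```python
import math

def kl(z, p):
    """KL divergence of Bernoulli(z) from Bernoulli(p), for 0 < z < 1."""
    return z * math.log(z / p) + (1 - z) * math.log((1 - z) / (1 - p))

def max_observations_equal_weights(p0, p1, a, zeta):
    """Return (z_star, m_star, bound) for the special case a = b.

    z_star solves (1 - z) ln((1-p0)/(1-p1)) - z ln(p1/p0) = 0.
    m_star uses the ASSUMED relation m * KL(z, p0) = ln(1 / (zeta * a)).
    bound = floor(m_star) + 1, as stated in the text.
    """
    num = math.log((1 - p0) / (1 - p1))
    z_star = num / (num + math.log(p1 / p0))
    m_star = math.log(1 / (zeta * a)) / kl(z_star, p0)
    return z_star, m_star, math.floor(m_star) + 1
```

For $p_0 = 0.2$, $p_1 = 0.8$, $\zeta = 1$ and $a = b = 0.1$, this sketch gives $z^* = 0.5$, $m^* \approx 10.32$ and a bound of 11 observations.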



4.4 Comparison with TRWA 

We have conducted numerical experiments to compare our detection scheme with TRWA. For
the case of $p_0 = 0.1$, $p_1 = 0.15$ and $\alpha = \beta = 0.1$, the risks of our detection
scheme (with $\zeta = 0.96$, $a = b = 0.1$) and TRWA are respectively shown by the blue and
green plots in Figure 2. With the same configuration, the ratio of the average number of
observations of our detection algorithm to that of TRWA is shown in Figure 3. Our computation
shows that the new detection algorithm requires a much smaller number of connection attempts to
detect a scanner as compared to TRWA.



5 Multi-Valued Decision

As can be seen from the risk requirement (2), there is no specification imposed for users with
success rate $p \in (p_0, p_1)$. This implies that those users can be arbitrarily classified as
either scanners or benign users. In applications, $p_0$ is usually chosen as a number close to
0, while $p_1$ is chosen as a number close to 1. Therefore, there exists a wide gap between
$p_0$ and $p_1$. This indicates that a large portion of "marginal" users is cast into either
the category of scanners or that of benign users. In view of this situation, we propose to
classify the users into three categories: scanner, marginal, and benign. Specifically, let
$p_0$ and $p_1$ be two threshold values such that $0 < p_0 < p_1 < 1$. We propose to test the
following three hypotheses:

$$\mathcal{H}_0 : p \le p_0, \qquad \mathcal{H}_1 : p_0 < p \le p_1, \qquad \mathcal{H}_2 : p > p_1$$

where hypotheses $\mathcal{H}_0$, $\mathcal{H}_1$ and $\mathcal{H}_2$ correspond to the
categories of "scanner", "marginal", and "benign". Based on the classification, different
actions are taken for the corresponding categories.



Figure 2: Comparison of risks

To control the probabilities of making wrong decisions, we impose the following requirement:

$$\Pr\{\text{Reject } \mathcal{H}_0 \mid p\} \le \delta_0 \ \text{ for } p \in (0, p_0'],$$
$$\Pr\{\text{Reject } \mathcal{H}_1 \mid p\} \le \delta_1 \ \text{ for } p \in [p_0'', p_1'],$$
$$\Pr\{\text{Reject } \mathcal{H}_2 \mid p\} \le \delta_2 \ \text{ for } p \in [p_1'', 1)$$

where $0 < p_0' < p_0 < p_0'' < p_1' < p_1 < p_1'' < 1$ and $\delta_i \in (0, 1)$ for
$i = 0, 1, 2$. The intervals $(p_0', p_0'')$ and $(p_1', p_1'')$ are called indifference zones,
since no specification is imposed for controlling the probability of making wrong decisions for
$p$ contained in these intervals. This problem is actually a special case of the general
problem of testing multiple hypotheses, which has been systematically addressed in our recent
paper [1]. The techniques in [1] offer a complete solution to the present problem of testing
triple hypotheses on the success rate $p$. As an illustration, assume that

$$\delta_0 = \delta_1 = \delta_2 = 0.1, \qquad p_0 = \frac{1}{3}, \quad p_1 = \frac{2}{3},$$

and 

$$p_0' = p_0 - \frac{1}{9}, \qquad p_0'' = p_0 + \frac{1}{9}, \qquad p_1' = p_1 - \frac{1}{9}, \qquad p_1'' = p_1 + \frac{1}{9}.$$

By virtue of the technique of [1], we have obtained a sequential testing scheme shown by Figure
[31 where the bottom, middle and upper shaded areas represent the acceptance regions of
$\mathcal{H}_0$, $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. The stopping and decision
rules can be stated as follows:

If the sample path, which can be represented by the plot of the relative frequency $\hat{p}_n$
versus the number $n$ of observations, hits a shaded region, then terminate the observational
process. At the termination of the observational process, accept the hypothesis whose
acceptance region is hit by the sample path.

Figure 3: Comparison of average number of observations

In Figure we plot the risk, Pr{The decision is incorrect | p}, versus the success rate p. It 
can be seen that the risk requirement is satisfied for any $p \in (0, 1)$ not contained in the indifference
zones. 

6 Conclusion 

We have developed new sequential methods for detecting portscanners. In addition to guaran- 
teeing the risk requirement, our algorithm is efficient when the success rate assumes values other 
than the threshold values. Moreover, the required number of observations is absolutely bounded. 
Furthermore, we have proposed a framework of multi-valued decision for testing portscanners. 